123 research outputs found

    Modeling and Control of MapReduce Systems

    Poster. Systems based on the MapReduce programming model are emerging as a central tool for deploying jobs that process large datasets in parallel. However, configuring MapReduce systems is a complex process that is currently left to the user. These ad hoc configuration methods make it difficult for small companies to take advantage of the growth of cloud computing solutions that provide resources as a service. Furthermore, defining SLAs becomes complicated for both the user and the service provider. We propose a control-theoretic approach to solving these problems. This requires developing a general model that captures the dynamics of MapReduce systems. Finally, we intend to provide novel control methods that ease the configuration process and guarantee service level objectives, such as constraints on system performance (execution times) and dependability (latency, availability), while optimizing resource consumption.

    Adaptive Modelling and Control in Distributed Systems

    Companies have growing amounts of data to store and process. In response to these new processing challenges, Google developed MapReduce, a parallel programming paradigm that is becoming the major tool for Big Data processing. Even though MapReduce is used by most IT companies, ensuring its performance while minimizing costs is a real challenge that requires a high level of expertise. Modelling and control of MapReduce have been developed in recent years; however, many problems remain because of the software's high variability. To tackle this issue, this paper proposes an online model estimation algorithm for MapReduce systems. An adaptive control strategy is developed and implemented to guarantee response time performance under a concurrent workload while minimizing resource use. Results have been validated using a 40-node MapReduce cluster under a data-intensive Business Intelligence workload running on Grid5000, a French national cloud. The experiments show that the adaptive control algorithm manages to guarantee performance and low costs even in a highly variable environment.

    Self-Optimization of Internet Services with Dynamic Resource Provisioning

    Self-optimization through dynamic resource provisioning is an appealing approach to tackling load variation in Internet services. It allows resources to be assigned to or released from Internet services according to the varying load. However, dynamic resource provisioning raises several challenges, including: (i) how to plan a good capacity for an Internet service, i.e., a necessary and sufficient amount of resources to handle the service's workload; (ii) how to manage both gradual load variation and load peaks in Internet services; (iii) how to prevent system oscillations in the presence of potentially concurrent dynamic resource provisioning; and (iv) how to provide generic self-optimization that applies to different Internet services such as e-mail services, streaming servers or e-commerce web systems. This paper addresses these questions. It presents the design principles and implementation details of a self-optimization autonomic manager. It describes the results of an experimental evaluation of the self-optimization manager with a realistic e-commerce multi-tier web application running on a Linux cluster. The experimental results show the usefulness of self-optimization in terms of end-user perceived performance and system operational costs, with negligible overhead.

    Applying Control to Guarantee the Performance of Big Data Systems

    We are at the dawn of an enormous data explosion, and the volume of data that companies must process keeps growing. To meet this challenge, Google developed MapReduce, a parallel programming model that is becoming the de facto tool for analyzing Big Data systems. Although its use is already fairly widespread in industry, guaranteeing the performance of such a complex system poses major problems, and managing it requires a high level of expertise. This article addresses these challenges by proposing the first autonomous system that guarantees response time constraints for a concurrent MapReduce workload. We develop the first dynamic model of a MapReduce cluster. In addition, a closed-loop controller is designed and implemented to guarantee a given response time. A feedforward controller is also added to improve the system's response in the presence of disturbances, here variations in the number of clients. The approach is validated online on a 40-node MapReduce cluster using a data-intensive Business Intelligence workload. Our experiments show that the resulting controller can guarantee the response time constraints.

    Feedback Autonomic Provisioning for Guaranteeing Performance in MapReduce Systems

    Companies have fast-growing amounts of data to process and store; a data explosion is happening all around us. Currently, one of the most common approaches to handling these vast quantities of data is based on the MapReduce parallel programming paradigm. While its use is widespread in industry, ensuring performance constraints while minimizing costs still poses considerable challenges. We propose a coarse-grained control-theoretic approach, based on techniques that have already proved their usefulness in the control community. We introduce the first algorithm to create dynamic models for Big Data MapReduce systems running a concurrent workload. Furthermore, we identify two important control use cases: relaxed performance with minimal resources, and strict performance. For the first case we develop two feedback control mechanisms: a classical feedback controller and an event-based feedback controller that also minimises the number of cluster reconfigurations. Moreover, to address strict performance requirements, we develop a feedforward predictive controller that efficiently suppresses the effects of large variations in workload size. All the controllers are validated online in a benchmark running on a real 60-node MapReduce cluster, using a data-intensive Business Intelligence workload. Our experiments demonstrate the success of the control strategies employed in assuring service time constraints.
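    The event-based feedback idea in this abstract can be illustrated with a minimal sketch. It is not the authors' controller: the proportional update rule, the deadband trigger, the class name `EventBasedController` and all parameter values are assumptions chosen for illustration only. The sketch resizes the cluster only when the service time error leaves a tolerance band, which is one simple way to limit reconfigurations.

```python
import math
from dataclasses import dataclass


@dataclass
class EventBasedController:
    """Illustrative event-triggered feedback on service time (hypothetical)."""

    target_s: float      # desired service time in seconds
    gain: float          # nodes to add per second of error (assumed tuning)
    deadband_s: float    # errors within this band trigger no reconfiguration
    min_nodes: int = 1
    max_nodes: int = 60  # cluster size taken from the abstract's 60-node testbed

    def update(self, nodes: int, measured_s: float) -> int:
        """Return the new cluster size for the measured service time."""
        error = measured_s - self.target_s
        if abs(error) <= self.deadband_s:
            # No event: keep the current size and avoid a reconfiguration.
            return nodes
        step = self.gain * error
        # Round away from zero so that a triggered event always changes the size.
        delta = math.ceil(step) if step > 0 else math.floor(step)
        # Clamp to the physical limits of the cluster.
        return max(self.min_nodes, min(self.max_nodes, nodes + delta))


controller = EventBasedController(target_s=100.0, gain=0.25, deadband_s=10.0)
```

    With these assumed settings, a measurement of 105 s leaves a 20-node cluster unchanged, while 140 s grows it by 10 nodes.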

    Adaptive Optimal Control of MapReduce Performance, Availability and Costs

    MapReduce is a popular programming model for distributed data processing and Big Data applications running on clouds. Extensive research has been conducted to improve either the dependability or the performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions and adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates an optimization-based solution to control MapReduce systems in order to provide guarantees in terms of both performance and availability while reducing utilization costs. We follow a control-theoretic approach for MapReduce cluster scaling and admission control. Moreover, we aim to be robust to changes in MapReduce and in its environment by adapting the controller online to those changes. This paper highlights the major challenges of combining system adaptation and optimal control to get the best of both approaches. CCS Concepts: Networks → Cloud computing; Software and its engineering → Software configuration management and version control systems; Computer systems organization → Dependable and fault-tolerant systems and networks.

    Towards Control of MapReduce Performance and Availability

    MapReduce is a popular programming model for distributed data processing and Big Data applications. Extensive research has been conducted to improve either the dependability or the performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions and adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates a novel solution that controls MapReduce systems and provides guarantees in terms of both performance and availability, while reducing utilization costs. We follow a control-theoretic approach for MapReduce cluster scaling and admission control. Preliminary results based on a simulation environment, previously validated on a real MapReduce cluster, show the effectiveness of the proposed control solutions for a Hadoop MapReduce cluster.

    Cost Function based Event Triggered Model Predictive Controllers - Application to Big Data Cloud Services

    Frequent cluster reconfiguration is a costly issue in Big Data cloud services. Current control solutions manage to scale the cluster according to the workload, but they do not try to minimize the number of system reconfigurations. Event-based control is known to reduce the number of control updates, typically by waiting for the system states to degrade below a given threshold before reacting. However, computing systems often have exogenous inputs (such as client connections) with delayed impacts, which makes it possible to anticipate state degradation. In this paper, a novel event-triggered approach is proposed. The triggering mechanism relies on a Model Predictive Controller (MPC) and is defined on the value of the optimal cost function instead of the state or output error. This controller reduces the number of control changes in the normal operation mode through constraints in the MPC formulation, but also ensures a very reactive behavior to changes in exogenous inputs. This novel control approach is evaluated using a model validated on a real Big Data system. The controller efficiently scales the cluster according to specifications while reducing its reconfigurations.

    CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions

    By regularly querying Web search engines, users (unconsciously) disclose large amounts of personal data as part of their search queries, some of which might reveal sensitive information (e.g. health issues, or sexual, political or religious preferences). Several solutions exist that allow users to query search engines while improving privacy protection. However, these solutions suffer from a number of limitations: some are subject to user re-identification attacks, while others lack scalability or are unable to provide accurate results. This paper presents CYCLOSA, a secure, scalable and accurate private Web search solution. CYCLOSA improves security by relying on trusted execution environments (TEEs) as provided by Intel SGX. Further, CYCLOSA proposes a novel adaptive privacy protection solution that reduces the risk of user re-identification. CYCLOSA sends fake queries to the search engine and dynamically adapts their count according to the sensitivity of the user query. In addition, CYCLOSA achieves scalability as it is fully decentralized, spreading the load of distributing fake queries among other nodes. Finally, CYCLOSA achieves accurate Web search results as it handles the real query and the fake queries separately, in contrast to other existing solutions that mix fake and real query results.
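    The adaptive fake-query idea in this abstract can be sketched as follows. This is not CYCLOSA's algorithm: the linear mapping from sensitivity to count, the bound `MAX_FAKES`, and the function names `fake_query_count` and `plan_queries` are assumptions made for illustration. The sketch only shows a fake-query count that grows with a sensitivity score in [0, 1], with the real query kept apart from the fakes.

```python
import math
import random
from typing import List, Tuple

MAX_FAKES = 7  # hypothetical upper bound, not a value from the paper


def fake_query_count(sensitivity: float, max_fakes: int = MAX_FAKES) -> int:
    """Map a sensitivity score in [0, 1] to a number of fake queries."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    # Assumed linear rule: more sensitive queries get more cover traffic.
    return math.ceil(sensitivity * max_fakes)


def plan_queries(
    real_query: str,
    sensitivity: float,
    decoys: List[str],
    rng: random.Random,
) -> Tuple[str, List[str]]:
    """Return the real query and a separate list of fake queries."""
    k = min(fake_query_count(sensitivity), len(decoys))
    # The fakes are returned separately so their results never mix with
    # the results of the real query.
    return real_query, rng.sample(decoys, k)


decoy_pool = [f"decoy query {i}" for i in range(10)]
real, fakes = plan_queries("flu symptoms", 0.5, decoy_pool, random.Random(0))
```

    Keeping the two return values separate mirrors the abstract's point that real and fake query results are handled apart.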